🎯 What We'll Cover
Lesson A was about driving the tool. Lesson B is the reason the whole track exists: using Claude Code to make your research genuinely reproducible — the kind of work a stranger, or you in two years, can open up, inspect, and repeat.
This first page draws the distinction that the rest of the lesson hangs on: reproducibility is not the same as verification, and the two demand different things. Then it lays out the structure that makes reproducibility possible — the project folder as the unit of trustworthy research — and gives you a scaffold you can copy for your own work.
The two pages that follow turn that structure into practice: a CLAUDE.md that enforces good habits, pre-registration, and reusable Skills (B.2); then a full worked analysis from messy data to a reproducible result, how to verify it, and how the folder itself becomes your disclosure (B.3).
⚖️ Reproducibility Is Not Verification
The course has taught verification thoroughly — Week 9 was largely about it. Verification asks a question about an output: is this correct? Did the model hallucinate the citation, get the analysis right, reason soundly? Reproducibility asks a different question, about a process: could someone else inspect what I did and repeat it? They are siblings, and they are not the same. A result can be correct but irreproducible (you got the right answer but cannot say how), and a process can be perfectly reproducible but wrong (anyone can repeat your flawed analysis exactly). Good research needs both.
Agentic work cuts both ways here, and it is worth being honest about both edges. It makes reproducibility harder, because the agent does a great deal autonomously and the steps can be opaque — this is Mollick's “wizard” problem from Week 11.1, where competence and opacity rise together: the more the agent does for you, the less you watched it do. But it also makes reproducibility easier in a way manual work never managed, because the agent can be instructed to document as it goes — to log every decision, name every source, and save every script, tirelessly, without the human tendency to think “I'll write that up later.”
💡 The claim this lesson makes good on
Used with discipline, Claude Code can produce research that is more reproducible than typical manual work — not less. The reason is simple: a human researcher documents their decisions when they remember to and have time; an agent told to log every consequential choice does it every time, by default. The opacity is real, but it is a problem you solve by instructing the agent to leave a trail, and the rest of this lesson is how.
📁 The Project Folder as the Unit of Reproducible Work
Lesson A's organising principle was "the chat is not the archive." Its positive form is this: the project folder is the unit of reproducible work. Everything that matters (the raw data, the code, the outputs, the decisions, the standing instructions, the version history) lives in one inspectable place. Here is what a reproducible version of the Berg River project looks like once it is set up properly:
berg-river-microplastics/
    CLAUDE.md                  # the standing instructions (B.2)
    data/
        raw/                   # the original data, never edited
        processed/             # cleaned data, regenerable from raw + scripts
    scripts/                   # the code, preserved and re-runnable
    outputs/                   # figures, tables, reports
    notes/
        decision-log.md        # every consequential choice, dated, with reasons
    docs/
        data-inventory.md      # what the data actually is
    pre-registrations/         # predictions + decision rules, before the compute (B.2)
Each part earns its place by answering a question a replicator would ask:
data/raw/ — never edited
The original data, treated as immutable. Cleaning produces new files in data/processed/; the raw stays untouched so the whole chain can always be rebuilt from the source. The CLAUDE.md will forbid the agent from editing anything in here.
scripts/ — the method, preserved
Every cleaning and analysis step is a saved script, not a one-off action in a chat. “Re-run the analysis” becomes a command, not a memory. This is the difference between a result you can regenerate and one you merely once obtained.
outputs/ — derived, separate
Figures, tables, and reports live apart from the data and the code that made them, so it is always clear what is source, what is method, and what is product. Nothing in outputs/ is hand-edited — if it's wrong, you fix the script and regenerate.
notes/decision-log.md — the reasons
The single most valuable file for reproducibility. It records which outliers were excluded and why, how missing values were handled, and which test was chosen: the decisions a reader most needs and most often cannot find. The agent appends to it as it works.
docs/data-inventory.md — what the data is
A plain description of each file, its variables, units, and known problems. The thing you wish every dataset you inherited had come with. Usually the first thing the agent writes, after inspecting the raw data.
pre-registrations/ — the commitments
Where you write down what you expect to find, and the rule for deciding, before you run anything. This is the strongest single guard against fooling yourself, and it gets its own section in B.2.
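Two of these parts are easy to back with small helpers. What follows is a minimal sketch, not part of the course materials: the function names (`sha256_of`, `snapshot_raw`, `changed_raw_files`, `log_decision`) and the one-line log entry format are assumptions made for illustration. It records a SHA-256 checksum for every file in data/raw/ so that later edits can be detected, and it appends dated entries to notes/decision-log.md.

```python
from __future__ import annotations

import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot_raw(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (by relative path) to its checksum."""
    return {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def changed_raw_files(raw_dir: Path, snapshot: dict[str, str]) -> list[str]:
    """List files added, removed, or modified since the snapshot was taken."""
    current = snapshot_raw(raw_dir)
    return sorted(
        name
        for name in snapshot.keys() | current.keys()
        if snapshot.get(name) != current.get(name)
    )


def log_decision(
    log_path: Path, decision: str, reason: str, when: date | None = None
) -> None:
    """Append one dated entry to the decision log (hypothetical entry format)."""
    when = when or date.today()
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end, so earlier entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"- {when.isoformat()}: {decision}. Reason: {reason}\n")
```

One way to use it: take a snapshot before an agent session and call `changed_raw_files` afterwards. An empty list means nothing in data/raw/ was touched. Opening the log in append mode mirrors the rule that the decision log only grows.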
⚖️ Remember: discipline proportionate to stakes
This is the full apparatus, for work whose results have to survive, be repeated, or be defended — the analysis behind a paper figure, say. It is deliberately heavy. As Lesson A.2 argued, you apply as much of it as the task in front of you deserves: an exploratory afternoon might need only a data-inventory.md and a saved script; the headline analysis of your thesis should have all of it. The structure here is what “good” looks like at the high-stakes end, not a ritual for every interaction.
📥 Download: the reproducible-project scaffold
An empty version of the structure above — the folders, a starter CLAUDE.md, and empty decision-log.md and data-inventory.md templates — ready to copy into your own project and adapt. reproducible-project-scaffold.zip
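If you would rather build the layout by script than unzip it, the folder structure shown earlier on this page can be created in a few lines. This is a minimal sketch under stated assumptions: `create_scaffold` is a hypothetical helper, and it creates empty placeholder files only, so it does not reproduce the starter CLAUDE.md or the templates that ship in the download.

```python
from pathlib import Path

# Folder and file names follow the project tree shown earlier on this page.
FOLDERS = [
    "data/raw",
    "data/processed",
    "scripts",
    "outputs",
    "notes",
    "docs",
    "pre-registrations",
]
EMPTY_FILES = [
    "CLAUDE.md",
    "notes/decision-log.md",
    "docs/data-inventory.md",
]


def create_scaffold(root: Path) -> list[Path]:
    """Create the layout under root; return the listed paths that did not exist before."""
    created = []
    for rel in FOLDERS:
        folder = root / rel
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)
    for rel in EMPTY_FILES:
        file = root / rel
        # Existing files are left alone, so re-running never overwrites your notes.
        if not file.exists():
            file.touch()
            created.append(file)
    return created
```

Calling `create_scaffold(Path("my-project"))` a second time returns an empty list and changes nothing, which makes it safe to run inside a project that is already partly set up.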
📚 Foundations and further reading
None of this discipline is new with AI; agentic tools mainly make it easier to practise. It draws on two established literatures worth knowing in their own right:
- Why reproducibility matters. That a great deal of published research does not hold up is argued in Ioannidis, Why Most Published Research Findings Are False (2005, PLoS Medicine); the Open Science Collaboration's mass replication of psychology studies (2015, Science); and Baker's survey of 1,500 researchers (2016, Nature).
- How to do it in practice. The project-folder structure above is essentially an implementation of Sandve et al., Ten Simple Rules for Reproducible Computational Research (2013, PLoS Computational Biology) and Wilson et al., Good Enough Practices in Scientific Computing (2017, PLoS Computational Biology) — both short, practical, and worth reading in full.
Coming up in B.2: the structure is the skeleton; the habits are what bring it to life. Next we write the CLAUDE.md that makes the agent keep the raw data sacred and the decision log current — the headline artefact of this track — then add pre-registration and reusable Skills.